Language/OS - Multiplatform Resource Library

home *** CD-ROM | disk | FTP | other *** search

/ Language/OS - Multiplatform Resource Library / LANGUAGE OS.iso / cocktail / doc.lha / doc.doc / intro.doc < prev next >

Wrap

Text File | 1992-09-25 | 37KB | 1,335 lines

___________________________________________________________________ Toolbox Introduction J. Grosch ___________________________________________________________________ ___________________________________________________________________ GESELLSCHAFT FUeR MATHEMATIK UND DATENVERARBEITUNG MBH FORSCHUNGSSTELLE FUeR PROGRAMMSTRUKTUREN AN DER UNIVERSITAeT KARLSRUHE ___________________________________________________________________ Project Compiler Generation ___________________________________________________________ Toolbox Introduction Josef Grosch Aug. 3, 1992 ___________________________________________________________ Report No. 25 Copyright c 1992 GMD Gesellschaft fuer Mathematik und Datenverarbeitung mbH Forschungsstelle an der Universitaet Karlsruhe Vincenz-Priesznitz-Str. 1 D-7500 Karlsruhe 1 Toolbox Introduction Abstract This document introduces into the usage of the Karlsruhe Toolbox for Compiler Construction. It should be read by those who effectively want to use the toolbox as first document. Those who want to learn about the toolbox and its contents in general are referred to the document "A Toolbox for Compiler Con- struction". This document gives an overview about the documentation of the toolbox. It describes how the individual tools interact in order to generate a complete compiler. The general structure of a makefile to control the tools is dis- cussed. 1. Document Overview The documentation of the Karlsruhe Toolbox for Compiler Construction consists of separate user's manuals for the individual tools and additional papers Table 1: Document Set ________________________________________________________________________________________________ Filename Title Pages ________________________________________________________________________________________________ intro Toolbox Introduction 12 toolbox A Tool Box for Compiler Construction 11 werkzeuge Werkzeuge fuer den Uebersetzerbau 12 reuse Reusable Software - A Collection of Modula-2-Modules 24 reuseC Reusable Software - A Collection of C-Modules 12 prepro Preprocessors 14 rex Rex - A Scanner Generator 32 scanex Selected Examples of Scanner Specifications 21 scangen Efficient Generation of Table-Driven Scanners 15 lalr-ell The Parser Generators Lalr and Ell 43 lalr Lalr - A Generator for Efficient Parsers 22 ell Efficient and Comfortable Error Recovery in Recursive Descent Parsers 15 highspeed Generators for High-Speed Front-Ends 12 autogen Automatische Generierung effizienter Compiler 10 ast Ast - A Generator for Abstract Syntax Trees 36 toolsupp Tool Support for Data Structures 11 ag Ag - An Attribute Evaluator Generator 27 ooags Object-Oriented Attribute Grammars 10 multiple Multiple Inheritance in Object-Oriented Attribute Grammars 10 puma Puma - A Generator for the Transformation of Attributed Trees 29 trafo Transformation of Attributed Trees Using Pattern Matching 16 minilax Specification of a MiniLAX-Interpreter 35 begmanual BEG - a Back End Generator - User Manual 71 mtc Entwurf und Implementierung eines Uebersetzers von Modula-2 nach C 105 estra Spezifikation und Implementierung der Transformation attributierter Baeume 79 ________________________________________________________________________________________________ 2 describing further aspects such as implementation details, examples, and applications. The documents are written in English. For a few of them there are German versions as well. Only the two master thesis about estra and mtc exist in German, only. 1.1. Format The documents exist in two formats: Postscript and troff. These are dis- tinguished by the file suffixes .ps and .me. The troff files need processing with pic and the device independent version of troff called ditroff using me- macros by commands like pic | tbl | eqn | ditroff -me Depending on the format the documents are located in the directories doc.ps or doc.me. 1.2. Documents Table 1 lists the titles of the documents, the corresponding filenames (without suffix), and the number of pages. 1.3. Outlines In the following the contents of every document is outlined shortly: Toolbox Introduction An introduction for effective users of the toolbox which should be con- sulted first. It gives an overview about the document set and describes how the tools interact. A Tool Box for Compiler Construction Explains the contents of the toolbox and the underlying design. The indi- vidual tools are sketched shortly and some application experiences are reported. Werkzeuge fuer den Uebersetzerbau A German version of the previous document. Reusable Software - A Collection of Modula-2-Modules Describes a library of general routines written in Modula-2 which are oriented towards compiler construction. The output of some tools has to be linked with this library. Reusable Software - A Collection of C-Modules Describes a library of general routines written in C which are oriented towards compiler construction. The output of some tools has to be linked with this library. Preprocessors Describes several preprocessors for the extraction of scanner specifica- tions out of parser specifications and for the conversion of lex/yacc input to rex/lalr input or vice versa. There are seven preprocessors: 3 cg -xz converts an attribute grammar to lalr input rpp combines a grammar and a scanner specification to rex input l2r converts lex input to rex input y2l converts yacc input to lalr input r2l converts rex input to lex input cg -u converts an attribute grammar to yacc input bnf converts a grammar from EBNF to BNF Rex - A Scanner Generator The user's manual for the scanner generator rex. Selected Examples of Scanner Specifications A collection of scanner specifications for rex dealing mostly with patho- logical cases. Efficient Generation of Table-Driven Scanners Describes internals of the scanner generator rex like the so-called tun- nel automaton and the linear time algorithm for constant regular expres- sions. The Parser Generators Lalr and Ell The user's manual for the LALR(1) parser generator lalr and the LL(1) parser generator ell. It describes among other things the input language common to both generators. Lalr - A Generator for Efficient Parsers Describes details of the implementation of the parsers generated by lalr and further outstanding features of this tool. Efficient and Comfortable Error Recovery in Recursive Descent Parsers Describes the implementation of the parsers generated by ell especially with respect to automatic error recovery. Generators for High-Speed Front-Ends A summary of the highlights of the scanner and parser generators and a comparison to lex/yacc and flex/bison. Automatische Generierung effizienter Compiler A German version of the previous document. Ast - A Generator for Abstract Syntax Trees The user's manual of ast, a tool supporting the definition and manipula- tion of attributed trees and graphs. Tool Support for Data Structures Also describes ast, but less precise than the previous document. Ag - An Attribute Evaluator Generator The user's manual of ag, a generator for evaluators of ordered attribute grammars (OAG). 4 Object-Oriented Attribute Grammars Also describes ag, like the previous document, with emphasis on the object-oriented features. Multiple Inheritance in Object-Oriented Attribute Grammars Extends the object-oriented attribute grammars described in the previous document to multiple inheritance. Spezifikation und Implementierung der Transformation attributierter Baeume Diploma thesis in German about the design and implementation of estra, a generator for the transformation of attributed trees. Puma - A Generator for the Transformation of Attributed Trees The user's manual of puma, a tool for the transformation of attributed trees which is based on pattern matching and unification. Transformation of Attributed Trees Using Pattern Matching Also describes puma using a more introductory style and compares it to similar tools. Specification of a MiniLAX-Interpreter The annotated input to generate a compiler for the example language Mini- LAX. BEG - a Back End Generator - User Manual The user's manual of the back-end-generator beg. Entwurf und Implementierung eines Uebersetzers von Modula-2 nach C Diploma thesis in German about the design and implementation of the Modula-to-C translator mtc. For readers intending to use the tools the following documents are of primary interest: intro Toolbox Introduction toolbox A Tool Box for Compiler Construction prepro Preprocessors rex Rex - A Scanner Generator lalr-ell The Parser Generators Lalr and Ell ast Ast - A Generator for Abstract Syntax Trees ag Ag - An Attribute Evaluator Generator puma Puma - A Generator for the Transformation of Attributed Trees begmanual BEG - a Back End Generator - User Manual Of secondary interest might be: reuse Reusable Software - A Collection of Modula-2-Modules scanex Selected Examples of Scanner Specifications Fig. 1: Compiler Structure 5 The other documents either describe internals of the tools or are excerpts of the above. 2. Generating a Compiler A compiler usually consists of several modules where every module handles a certain task. The toolbox gives very much freedom for the design of a com- piler and supports various structures. Figure 1 presents our preferred compiler structure. In the right column are the main modules that constitute a compiler. The left column contains the necessary specifications. In between there are the tools which are controlled by the specifications and which produce the modules. The arrows represent the data flow in part during generation time and in part during run time of the compiler. In principle the compiler model works as follows: A scanner and a parser read the source, check the concrete syntax, and construct an abstract syntax tree. They may perform several normalizations, simplifications, or transforma- tions in order to keep the abstract syntax relatively simple. Semantic analysis is performed on the abstract syntax tree. Optionally attributes for code generation may be computed. Afterwards the abstract syntax tree is transformed into an intermediate representation. The latter is the input of the code generator which finally produces the machine code. The picture in Figure 1 is relatively abstract by just listing the main tasks of a compiler. Every task is generated by a tool out of a separate specification which is oriented towards the problem at hand. The generation processes seem to be independent of each other. For a real user a more closer look than the one of Figure 1 is necessary. Figure 2 describes the actual interaction among the tools. It describes the data flow starting from specifications and ending in an executable compiler. Boxes represent files, circles represent tools, and arrows show the data flow. The input specifications are contained in the files at the left-hand side. The tools generate modules containing source code in the implementation languages C or Modula-2. This modules are shown at the right-hand side. Every module conists of two files with the following suffixes: implementation language C: .h header or interface file .c implementation part implementation language Modula-2: .md definition module .mi implementation module Fig. 2: Interaction and data flow among the tools 6 Files outside the left- and right-hand side columns contain intermediate data. The various kinds of information in the files are distinguished by dif- ferent file types as explained in Table 2. The few dependencies between tools are shown by the data flow via intermediate files. These dependencies are explained in more detail in the next sections. Table 2: File Types ______________________________________________________________________ Suffix Meaning ______________________________________________________________________ .pars scanner and parser specification (including S-attribution) .scan rest of scanner specification .rpp intermediate data: scanner description extracted from .pars .rex scanner specification understood by rex .lalr parser specification understood by lalr .ell input for ell (= input for lalr with EBNF constructs) .cg input for ast and ag .puma input for puma .ST intermediate data: storage assignment for attributes .TS intermediate data: description of attributed tree ______________________________________________________________________ .h C source: header or interface file .c C source: implementation part .md Modula-2 source: definition module .mi Modula-2 source: implementation module .exe compiled and linked executable compiler ______________________________________________________________________ 2.1. Scanning and Parsing Two parser generators are contained in the toolbox. First, the user has to decide which one to use. I will not start arguing here in favour of one or the other grammar class. If I am asked, I recommend to use lalr. Both parser gen- erators and their common input language (types .lalr and .ell) are documented in "The Parser Generators Lalr and Ell". From the syntactic point of view both tools understand almost the same input language. The only incompatibility concerns the different notation to access attributes. From the semantic point of view there are of course differences with respect to the grammar class and the kind of attribution evaluable during parsing. Whereas lalr accepts LALR(1) grammars and is able to evaluate and S-attribution (SAG), ell accepts LL(1) grammars and is able to evaluate an L-attribution (LAG). Both, lalr and ell generate a module with the default name Parser which serves as basename for the file name, too. This module name can be chosen freely using an appropriate directive in the input of the parser generators. Both parser generators can also supply a module called Errors. This is a simple prototype for handling error messages that just prints the messages. However, it is recommended to use the more comfortable module Errors from the library reuse. In simple cases, this module is just linked to the user's program. If modifications are necessary this module should be copied from the library along with its compan- ion module Positions into the user's directory. The module Positions defines a data structure to describe source positions. 7 The scanner generator rex and its input language (type .rex) are docu- mented in "Rex - A Scanner Generator". rex generates a module with the default name Scanner which serves as basename for the file name, too. rex can also generate a module called Source which isolates the task of providing input for the scanner. By default it reads from a file or from standard input. Again, it is recommended to use the module Source from the library reuse. In simple cases, this module is just linked to the user's program. If modifications are necessary or the module should provide input for a scanner with a name different to Scanner then this module must be requested from rex. It is possible to combine several scanners and parsers either generated by lalr or ell into one program as long as different module names are chosen. If the parser generator ell is to be used, the inputs of rex and ell have to be specified in the languages directly understood by these tools (types .rex and .ell). If the parser generator lalr is to be used, a more comfort- able kind of input language is available. It is possible to extract most of a scanner specification out of a parser grammar. Therefore it is recommended to specify scanner and parser by two files of types .scan and .pars. Further advantageous of this approach are that concrete syntax, abstract syntax, and attribute grammar are written in one common language (types .pars and .cg) and that the attribution to be evaluated during parsing is written using named attributes. This attribution is checked for completeness and whether it obeys the SAG property. The language to describe concrete and abstract syntax is documented in: "Ast - A Generator for Abstract Syntax Trees". The addition of attribute declarations and attribute computations are documented in: "Ag - An Attribute Evaluator Generator". The use of this language especially as input for scanner and parser generation is documented in: "Preprocessors". This document also describes the preprocessors cg -xz and rpp. cg -xz converts input of type .pars into input directly understood by lalr (type .lalr) and it extracts most of the scanner specification which is written on the intermedi- ate file named Scanner.rpp. The rex preprocessor rpp merges this extracted scanner specification with additional parts provided by the user (type .scan) and produces input directly understood by rex. The language in files of type .scan is an extension of the input language of rex. These extensions are also documented in: "Preprocessors". 2.2. Semantic Analysis and Transformation Our preferred compiler design constructs an abstract syntax tree as underlying data structure for semantic analysis. Afterwards this tree is usually mapped to some kind of intermediate language by a phase termed transformation. The syntax tree as central data structure is managed by the module Tree. This module is generated by the tool ast out of a specification in a file of type .cg. The tool ast and its input language are documented in: "Ast - A Generator for Abstract Syntax Trees". The construction of trees is usually done during parsing. It is specified within the semantic actions of the input of the parser generator. One possibility for the specification of semantic analysis is the use of an attribute grammar. The tool ag generates an evaluator module (named Eval by default) out of an attribute grammar. As this tool also has to know the 8 Table 3: Library Units Needed by Generated Modules ___________________________________________________________________________________ Tool Module C Modula-2 ___________________________________________________________________________________ rex Source System System rex Scanner System, Source System, Checks, Memory, Strings, IO, Position, Source lalr Parser Memory, DynArray, Sets, System, DynArray, Sets, Strings Errors Positions, Errors lalr Errors System, Sets, Idents, System, Strings, Idents, Sets, IO, Positions Positions ell Parser Errors System, Strings, Positions, Errors ell Errors System, Sets, Idents, System, Strings, Idents, Sets, IO, Positions Positions reuse Errors System, Memory, Sets, System, Memory, Strings, StringMem, Idents, Positions Idents, Sets, IO, Positions, Sort ast Tree System, General, Memory, System, General, Memory, DynArray, DynArray, StringMem, IO, Layout, StringMem, Strings, Idents, Sets, Positions Idents, Texts, Sets, Positions ag Eval - - puma Trafo System System, IO ___________________________________________________________________________________ structure of the abstract syntax tree both, ast and ag usually process the same input file. The tool ag and the extensions to the input language of ast for attribute grammars are documented in: "Ag - An Attribute Evaluator Genera- tor". The optimizer of ag decides how to implement attributes. They can be either stored in the tree or in global stacks or global variables. This infor- mation is communicated from ag to ast in files of type .ST. (This feature is not implemented yet.) The tool puma generates transformers (named Trafo by default) that map attributed trees to arbitrary output. As this tool also has to know about the structure of the tree this information is communicated from ast to puma via a file of type .TS. The tool puma and its input language are documented in: "Puma - A Generator for the Transformation of Attributed Trees". The names of the modules produced by the tools ast, ag, and puma can be controlled by directives in the input. Figure 2 uses the default names. By chosing different names it is possible to combine several tree modules, attri- bute evaluators, and transformers in one program. 2.3. Compiling and Linking All the source modules generated by the tools have to be compiled by a com- piler appropriate for the implementation language (C or Modula-2). Additional hand-written modules can be added as necessary. In Figure 2 the module Support indicates this possibility. In the last step all binaries have to be linked 9 Table 4: Modules in the Library Reuse ____________________________________________________________________________________ Module Task C Modula-2 ____________________________________________________________________________________ Memory dynamic storage (heap) with free lists y y Heap dynamic storage (heap) without free lists - y DynArray dynamic and flexible arrays y y Strings string handling - y StringMem string memory y y Idents identifier table - unambiguous encoding of strings y y Lists lists of arbitrary objects - y Texts texts are lists of strings (lines) - y Sets sets of scalar values (without run time checks) y y SetsC sets of scalar values (with run time checks) - y Relations binary relations between scalar values - y IO buffered input and output - y StdIO buffered IO for standard files - y Layout more routines for input and output - y Positions handling of source positions y y Errors error handler for parsers and compilers y y Source provides input for scanners y y Sort quicksort for arrays with elements of arbitrary type - y System interface to the operating system y y General miscellaneous functions y y ____________________________________________________________________________________ together with a few modules of the library reuse to yield an executable com- piler. The use of modules from the library reuse depends on the implementation language and the used tools. There is a C and a Modula-2 version of this library. The Modula-2 version is documented in: "Reusable Software - A Col- lection of Modula-2-Modules". The C version is documented in: "Reusable Software - A Collection of C-Modules". Table 3 lists the library units needed by tool generated modules. Additionally the user may engage further modules from this library for various tasks. Table 4 lists the modules that might be of interest. The right-hand side columns describe the availability of the modules with regard to the implementation language. 3. Makefile The tools of the toolbox are conveniently controlled by the UNIX program make and an appropriate makefile - at least under the UNIX operating system. This eases the invocation of the tools and minimizes the amount of regeneration after changes. Figure 2 can be used to derive a makefile because it describes most of the dependencies among tools and files. The following makefiles con- trol the generation of compilers for the example language MiniLAX in the tar- get languages C and Modula-2. The annotated specification of this language is documented in: "Specification of a MiniLAX-Interpreter". The makefiles are examples a user can start with. 10 The makefiles in the next sections deviate in the following from the fun- damental structure presented in Figure 2: - The attribute evaluator module is called Semantics instead of Eval. - The transformation module is called ICode instead of Trafo. - The hand-written modules are called ICodeInter and minilax. The latter constitutes the main program. - There are two support modules for semantic analysis called Definitions and Types. These are generated by tools (ast/cg and puma), too. 3.1. C The macro LIB specifies the directory where the compiled library reuse is located. The name of this library is libreuse.a. The -I flag in the macro CFLAGS specifies the directory where the header files of the library modules are located. The first command (target minilax) is for linking the compiled modules to an executable compiler. The succeeding entries describe the dependencies dur- ing the generation phase and the invocation of the tools. The dependencies before the target lint are those needed during compilation. LIB = $(HOME)/lib INCDIR = $(LIB)/include CFLAGS = -I$(INCDIR) CC = cc SOURCES = Scanner.h Scanner.c Parser.h Parser.c Tree.h Tree.c \ Semantics.h Semantics.c Types.h Types.c Definitions.h Definitions.c \ ICode.h ICode.c ICodeInter.h ICodeInter.c minilax.c BINS = minilax.o Scanner.o Parser.o Tree.o \ Types.o Definitions.o Semantics.o ICode.o ICodeInter.o 11 minilax: $(BINS) $(CC) $(CFLAGS) $(BINS) $(LIB)/libreuse.a -o minilax -lm Scanner.rpp Parser.lalr: minilax.pars cg -cxzj minilax.pars; minilax.rex: minilax.scan Scanner.rpp rpp < minilax.scan > minilax.rex; Scanner.h Scanner.c: minilax.rex rex -cd minilax.rex; Parser.h Parser.c: Parser.lalr lalr -c -d Parser.lalr; Tree.h Tree.c: minilax.cg cg -cdimRDI0 minilax.cg; Semantics.h Semantics.c: minilax.cg cg -cDI0 minilax.cg; Definitions.h Definitions.c Definitions.TS: Definitions.cg cg -cdim4 Definitions.cg; Tree.TS: minilax.cg echo SELECT AbstractSyntax Output | cat - minilax.cg | cg -c4 Types.h Types.c: Types.puma Tree.TS puma -cdipk Types.puma; ICode.h ICode.c: ICode.puma Tree.TS Definitions.TS puma -cdi ICode.puma; Parser.o: Parser.h Scanner.h Tree.h Types.h Definitions.h Semantics.o: Semantics.h Tree.h Definitions.h Types.h Tree.o: Tree.h Definitions.o: Definitions.h Tree.h Types.o: Tree.h Types.h ICode.o: Tree.h Types.h ICodeInter.h minilax.o: Scanner.h Parser.h Tree.h Semantics.h Definitions.h ICode.h \ ICodeInter.h Types.o lint: $(SOURCES) lint -I$(HOME)/reuse/c -u *.c clean: rm -f Scan*.? Parser.? Tree.? Sema*.? Defi*.? Types.? ICode.? *.TS rm -f core _Debug minilax *Tab mini*.rex Parser.lalr Scan*.rpp yy*.w *.o .c.o: $(CC) $(CFLAGS) -c $*.c; 12 3.2. Modula-2 The first command (target minilax) describes compilation and linking using the GMD Modula-2 compiler MOCKA. This compiler does its own dependency analysis among the sources. Therefore the makefile does not contain any dependency descriptions between sources and binaries. The -d flag of the compiler call mc specifies the directory where the library reuse is located. The rest of the makefile describes the generation phase and the invocation of the tools. SOURCES = Scanner.md Scanner.mi Parser.md Parser.mi \ Tree.md Tree.mi Semantics.md Semantics.mi \ Types.md Types.mi Definitions.md Definitions.mi \ ICode.md ICode.mi ICodeInter.md ICodeInter.mi minilax.mi minilax: $(SOURCES) echo p minilax | mc -d ../../reuse/src Scanner.rpp Parser.lalr: minilax.pars cg -xzj minilax.pars; minilax.rex: minilax.scan Scanner.rpp rpp < minilax.scan > minilax.rex; Scanner.md Scanner.mi Scanner.Tab: minilax.rex rex -d minilax.rex; Parser.md Parser.mi Parser.Tab: Parser.lalr lalr -d Parser.lalr; Tree.md Tree.mi: minilax.cg cg -dimRDI0 minilax.cg; Semantics.md Semantics.mi: minilax.cg cg -DI0 minilax.cg; Definitions.md Definitions.mi Definitions.TS: Definitions.cg cg -dim4 Definitions.cg; Tree.TS: minilax.cg echo SELECT AbstractSyntax Output | cat - minilax.cg | cg -4 Types.md Types.mi: Types.puma Tree.TS puma -dipk Types.puma; ICode.md ICode.mi: ICode.puma Tree.TS Definitions.TS puma -di ICode.puma; clean: rm -f Scan*.m? Parser.m? Tree.m? Sema*.m? Defi*.m? Types.m? ICode.m? rm -f core *.TS *.[dimor] _Debug minilax *Tab mini*.rex Parser.lalr Scan*.rpp 1 Contents Abstract ........................................................ 1 1. Document Overview ............................................... 1 1.1. Format .......................................................... 2 1.2. Documents ....................................................... 2 1.3. Outlines ........................................................ 2 2. Generating a Compiler ........................................... 5 2.1. Scanning and Parsing ............................................ 6 2.2. Semantic Analysis and Transformation ............................ 7 2.3. Compiling and Linking ........................................... 8 3. Makefile ........................................................ 9 3.1. C ............................................................... 10 3.2. Modula-2 ........................................................ 12